HDDS-15601. Coalesce contiguous container IDs into range deletes in ContainerHealthSchemaManager#10549
HDDS-15601. Coalesce contiguous container IDs into range deletes in ContainerHealthSchemaManager#10549umesh9794 wants to merge 2 commits into
Conversation
…ontainerHealthSchemaManager
ContainerHealthSchemaManager.deleteScmStatesForContainers (used by the Recon
container health scan via replaceUnhealthyContainerRecordsAtomically) previously
issued one chunked DELETE ... WHERE container_id IN (...) per 1,000 IDs,
regardless of how the IDs were distributed. For large contiguous ID ranges this
generated many statements where a single range delete would suffice.
Sort and de-duplicate the IDs, then split them into maximal contiguous runs:
- runs of at least MIN_RANGE_DELETE_RUN (one IN-chunk worth) are removed with a
single constant-size DELETE ... WHERE container_id BETWEEN lo AND hi
- shorter runs and scattered IDs continue to use bytecode-safe chunked IN deletes
This keeps the common production case (dense ID ranges with gaps from deleted
containers) to a handful of statements while never doing worse than plain IN
chunking for sparse IDs. The 7-state SCM filter is applied on both paths, so
non-SCM states such as ALL_REPLICAS_BAD are preserved as before.
Add TestContainerHealthSchemaManager covering contiguous, scattered, mixed, and
unsorted/duplicate ID inputs, and asserting non-SCM states inside a deleted range
are retained.
|
@devmadhuu @adoroszlai please take a look at this PR. Its part of changes suggested in #10532 10532 |
|
Thanks @umesh9794 for the patch. Next time please keep the original PR instead of opening a new one. Now discussion will be split between two PRs. |
| totalDeleted += deleteScmStatesByRange(dslContext, | ||
| sortedIds.get(runStart), sortedIds.get(i)); |
There was a problem hiding this comment.
What about the suggestion to include multiple BETWEEN ranges in the same delete statement?
There was a problem hiding this comment.
I was trying to keep the DELETE statements with BETWEEN and IN clause separate to issue 2 DELETE statements. So in the worst case it falls back to the same IN chunking the code used before this change, so it's never worse than the old baseline of ceil(n / 1000). In best case it will issue just 1 statement for a fully contiguous block.
As the comments says above combining both might overflow Derby's per-statement bytecode limit (Bytecode per method).
If you prefer, I can combine it both into one. Thanks!
There was a problem hiding this comment.
Using BETWEEN instead of IN for contiguous range of IDs is a good idea.
But I don't think the current change has much effect on production behavior, a range of 1000+ "unhealthy" containers in Recon is not real life scenario. So it adds complexity without benefit.
We should apply this logic for shorter ranges, without executing too many small statements. For that, we need to find the threshold (number of container IDs) over which BETWEEN n AND n+k OR yields shorter code than the corresponding IN clause. Any number of IDs beyond that is a win.
There was a problem hiding this comment.
Sure, let me apply that change here, thanks.
There was a problem hiding this comment.
Using
BETWEENinstead ofINfor contiguous range of IDs is a good idea.But I don't think the current change has much effect on production behavior, a range of 1000+ "unhealthy" containers in Recon is not real life scenario. So it adds complexity without benefit.
We should apply this logic for shorter ranges, without executing too many small statements. For that, we need to find the threshold (number of container IDs) over which
BETWEEN n AND n+k ORyields shorter code than the correspondingINclause. Any number of IDs beyond that is a win.
@adoroszlai I have applied the changes, please take a look. Thanks!
I agree, actually that was my mistake. I reverted the commit and pushed in the same branch which accidentally closed the PR which was unintentional. |
…rHealthSchemaManager Coalesce runs of 3+ IDs into combined BETWEEN/IN predicates bounded by an operand budget, and extend TestContainerHealthSchemaManager to cover combined and multi-statement deletes.
devmadhuu
left a comment
There was a problem hiding this comment.
Thanks @umesh9794 for the patch. Functionally by seeing code, looks fine, but we need to consider few real non-functional issues which can break things in production cluster:
- Earlier old code was safely putting 1000 Ids under
INclause, so it was tested and known that it will not bring Derby's 64KB limit issue as all 1000 Ids were used to stay same in count and predicted with size of compiled Derby byte code. I mean , it was assuring that keep each SQL command small enough that Derby's secret program stays under 64 KB. Question is now is this new logic brings that safety when at run time it will determine the number of container ids with mix n match of IN and BETWEEN ? I have a doubt about that and in that case Derby may crash. - The new derby statement formed in a more complicated way: Lets take an analogy: "
find everyone in the range 1–100, OR everyone in range 150–2000, OR these specific scattered container ids." When a database sees a question full of "OR ... OR ... OR," it sometimes gives up on using the index and instead just reads the entire table from row to row to be safe. That's called a full table scan, and it's much slower.
On top of being slower, this delete runs inside a transaction where the database locks rows while it works. If it scans the whole table instead of jumping to specific rows, it may lock far more than it needs to, which can slow down or block other things happening at the same time.
So the change that was meant to make things faster could, depending on how Derby decides to run it, actually make the delete slower and heavier. There's an existing performance test suite that would reveal this, but the PR doesn't mention running it — it only mentions the new correctness test.
3. The whole optimization only helps when IDs are next to each other (like 100, 101, 102, 103...). That's when you can collapse them into a tidy range.
But in real life, unhealthy container IDs are usually scattered (like 14, 88, 230, 911...). The PR's own description even admits it's aiming at "fragmented" data. Here's the irony: scattered data has no runs to collapse. So the clever new logic finds nothing to optimize and falls back to behaving like the old code anyway. So in the common case, you've added cost and effort for zero benefit. If you have real production kind of test which can prove with figures, please publish those stats.
- The old version of this logic was about 15 lines and dead simple: "chop the list into groups of 1,000 and delete each group." Boring, but easy to read, easy to trust, and it already fully solved the only real requirement (stay under Derby's limit).
The new version is about 100 lines with a genuinely clever algorithm (detecting runs, mixing ranges and lists, tracking an operand budget, flushing batches). Clever code is not free — it's harder to read, harder to test, and easier to break later when someone else touches it.
What changes were proposed in this pull request?
Coalesce contiguous container IDs into range deletes in ContainerHealthSchemaManager
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15601
How was this patch tested?
Added unit test to cover the code changes